Web Scraping

Web scraping is a technique for extracting information from websites.

Scraperwiki

http://scraperwiki.com

Scraperwiki has tutorials on scraping webpages for data, written for Python and Ruby.

Scraper: a Plug-in for Chrome

Scraper is a cool Chrome plug-in I've just discovered that makes scraping web pages easy. Just:
  1. Highlight the part of a table that you want to scrape, selecting at least one row.
  2. Right-click on the selection and select "Scrape similar" from the pop-up menu. Some reasonable scraping defaults will appear.
  3. Press the "Export to Google Docs..." button to save the scraped data to a Google Docs spreadsheet.
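
For comparison, the same idea (pull every row out of a table and export it) can be sketched in Python using only the standard library. This is a rough illustration, not how the Scraper plug-in works internally. The names `TableRowParser`, `table_to_csv` and `SAMPLE_HTML` are made up for this example, and the sample table is invented data.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by sloppy markup


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample data standing in for a real page.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>30,000</td></tr>
  <tr><td>Shelbyville</td><td>25,000</td></tr>
</table>
"""

print(table_to_csv(SAMPLE_HTML))
```

The resulting CSV text can be saved to a file and opened in any spreadsheet program. Real pages usually need more care (nested tables, merged cells), which is where a dedicated tool earns its keep.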

Google Refine

Use it to clean up messy and inconsistent data.
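
To give a feel for what this kind of cleanup involves, here is a small Python sketch that groups values differing only in case, spacing, punctuation or word order. It is loosely modelled on fingerprint-style clustering and is not Google Refine's actual implementation. The function names `fingerprint` and `cluster` and the sample values are assumptions made for this example.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a key so trivially different spellings collide."""
    cleaned = value.strip().lower()
    # Remove ASCII punctuation such as commas and periods.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate the words so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Return groups of distinct spellings that share a fingerprint."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    # Keep only groups that actually contain inconsistent spellings.
    return [group for group in groups.values() if len(set(group)) > 1]


# Invented sample data.
messy = ["New York", "new york ", "York, New", "Boston", "boston", "Chicago"]

for group in cluster(messy):
    print(group)
```

Each printed group is a candidate for merging into one canonical spelling. Whether the members really mean the same thing is still a call for a human to make.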

Chrome Developer Tools

Use them to see the DOM underlying a web page.

If there is a table of data on a web page that you want to scrape, select it with your mouse, right-click on the selection and choose "Inspect Element" in the pop-up menu. This should work in Safari, Chrome, or Firefox (with the Firebug plug-in).
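
Outside the browser, a rough text version of that element tree can be printed with Python's standard `html.parser` module. This is only a sketch of the idea and shows far less than the developer tools do. The names `DomOutline`, `outline`, `VOID_TAGS` and `SAMPLE_PAGE` are made up for this example, and the sample markup is invented.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not add nesting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class DomOutline(HTMLParser):
    """Records each opening tag, indented by its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. "disabled") arrive as None.
        attr_text = "".join(
            f" {name}" if value is None else f' {name}="{value}"'
            for name, value in attrs
        )
        self.lines.append("  " * self.depth + f"<{tag}{attr_text}>")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def outline(html):
    """Return the indented list of opening tags found in `html`."""
    parser = DomOutline()
    parser.feed(html)
    parser.close()
    return parser.lines


# Invented sample markup.
SAMPLE_PAGE = (
    '<div id="data"><table class="stats">'
    "<tr><td>1</td><td>2</td></tr>"
    "</table></div>"
)

print("\n".join(outline(SAMPLE_PAGE)))
```

The ids and classes that show up in the outline (here `data` and `stats`) are the hooks a scraper would typically use to locate the table.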

Data Analysis




Published

28 January 2012

Tags